Reinforcement learning under circumstances beyond its control
Abstract
Decision theory addresses the task of choosing an action; it provides robust criteria for making decisions under uncertainty or risk. Decision theory has been applied to produce reinforcement learning algorithms that manage uncertainty in state transitions. However, because reinforcement learning tasks are multiple-step decision problems, performance must also be considered when there is uncertainty about which actions will be selected in the future. This work proposes β-pessimistic Q-learning, a reinforcement learning algorithm that does not assume complete control.
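The abstract does not give the update rule, so the following is only a minimal sketch of one plausible reading, not the author's stated method. It assumes a tabular Q-function and assumes that "not assuming complete control" means blending the best and worst next-state action values with a weight β (β = 0 recovers ordinary Q-learning, β = 1 is fully pessimistic). The names `beta_pessimistic_target` and `update`, and all parameter defaults, are hypothetical.

```python
from typing import Dict, Sequence, Tuple

State = int
Action = int
QTable = Dict[Tuple[State, Action], float]


def beta_pessimistic_target(
    q: QTable,
    next_state: State,
    actions: Sequence[Action],
    reward: float,
    gamma: float,
    beta: float,
) -> float:
    """Assumed target: reward plus a discounted blend of best and worst next values."""
    # Unseen state-action pairs default to 0.0 (an assumption of this sketch).
    next_values = [q.get((next_state, a), 0.0) for a in actions]
    best, worst = max(next_values), min(next_values)
    # beta weights the worst case; (1 - beta) weights the usual greedy case.
    return reward + gamma * ((1.0 - beta) * best + beta * worst)


def update(
    q: QTable,
    state: State,
    action: Action,
    reward: float,
    next_state: State,
    actions: Sequence[Action],
    alpha: float = 0.1,
    gamma: float = 0.9,
    beta: float = 0.5,
) -> float:
    """Move Q(state, action) a step of size alpha toward the assumed target."""
    old = q.get((state, action), 0.0)
    target = beta_pessimistic_target(q, next_state, actions, reward, gamma, beta)
    q[(state, action)] = old + alpha * (target - old)
    return q[(state, action)]
```

Under this reading, a larger β makes the learner value a state less when one of its available actions is bad, which is the behaviour one would want if a future action might be chosen by something other than the agent.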
Related articles
Learning to control dynamic systems via associative reinforcement learning
and the lack of explicit instructional information about how to perform a given control task. Under these circumstances, techniques developed by artificial intelligence researchers for "learning from examples," including the "supervised learning" techniques studied by neural network researchers, are not directly applicable because these techniques are based on the availability of training inform...
Reinforcement Learning Methods for Continuous-Time Markov Decision Problems
Semi-Markov Decision Problems are continuous-time generalizations of discrete-time Markov Decision Problems. A number of reinforcement learning algorithms have been developed recently for the solution of Markov Decision Problems, based on the ideas of asynchronous dynamic programming and stochastic approximation. Among these are TD(λ), Q-learning, and Real-time Dynamic Programming. After revie...
Learning Strategies for Mid-Level Robot Control: Some Preliminary Considerations and Experiments
Versatile robots will need to be programmed, of course. But beyond explicit programming by a programmer, they will need to be able to plan how to perform new tasks and how to perform old tasks under new circumstances. They will also need to be able to learn. In this article, I concentrate on two types of learning, namely supervised learning and reinforcement learning of robot control programs. ...
The curse of planning: dissecting multiple reinforcement-learning systems by taxing the central executive.
A number of accounts of human and animal behavior posit the operation of parallel and competing valuation systems in the control of choice behavior. In these accounts, a flexible but computationally expensive model-based reinforcement-learning system has been contrasted with a less flexible but more efficient model-free reinforcement-learning system. The factors governing which system controls ...
Reinforcement Learning by Comparing Immediate Reward
This paper introduces an approach to reinforcement learning that compares immediate rewards, using a variation of the Q-learning algorithm. Unlike conventional Q-learning, the proposed algorithm compares the current reward with the immediate reward of the previous move and acts accordingly. Relative-reward-based Q-learning is an approach towards interactive learning. Q-Learning is a model free re...